
Reduce whisper decoder file size with onnx export #328

Merged: 1 commit into k2-fsa:master on Sep 20, 2023

Conversation

@csukuangfj (Collaborator) commented on Sep 20, 2023:

Before this pull-request:

-rw-r--r--  1 fangjun  staff   105M Aug  7 16:22 tiny.en-decoder.int8.onnx
-rw-r--r--  1 fangjun  staff   185M Sep 20 19:13 tiny.en-decoder.onnx

With this pull-request:

-rw-r--r--  1 fangjun  staff    86M Sep 20 19:09 tiny.en-decoder.int8.onnx
-rw-r--r--  1 fangjun  staff   109M Sep 20 19:09 tiny.en-decoder.onnx

It turns out the ONNX export saves a transposed copy of self.textDecoder.token_embedding.weight as a separate initializer used at the output when computing logits. This PR removes the transposed copy to reduce the file size.
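The duplication can be illustrated with a small NumPy sketch. The shapes below are assumptions based on the public Whisper tiny config (n_vocab = 51864, n_state = 384), not values taken from this PR; the point is only that storing the transposed matrix alongside the original doubles the storage for that weight:

```python
import numpy as np

# Assumed Whisper tiny.en dimensions (illustrative, from the public config).
n_vocab, n_state = 51864, 384
rng = np.random.default_rng(0)

# The decoder's token embedding table, shape (n_vocab, n_state).
token_embedding = rng.standard_normal((n_vocab, n_state), dtype=np.float32)

# Whisper ties the output projection to the embedding table:
#   logits = hidden_states @ token_embedding.T
hidden = rng.standard_normal((1, 4, n_state), dtype=np.float32)
logits_tied = hidden @ token_embedding.T

# A naive export can constant-fold the transpose into a second initializer,
# materializing the same data a second time:
transposed_copy = token_embedding.T.copy()
logits_duplicated = hidden @ transposed_copy

# Both paths produce identical logits, so the extra copy is pure overhead.
assert np.allclose(logits_tied, logits_duplicated)

# One float32 copy of this table is about 76 MB, which is consistent with
# the 185 MB -> 109 MB reduction reported above.
print(f"{token_embedding.nbytes / 2**20:.0f} MB per copy")
```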


https://github.com/openai/whisper/#available-models-and-languages
says tiny.en has 39 M parameters, so its float32 file size should be about 39 * 1e6 * 4 / 1024 / 1024 ≈ 148.77 MB.

Our exported ONNX model file sizes for float32 are

-rw-r--r--  1 fangjun  staff   109M Sep 20 19:31 tiny.en-decoder.onnx
-rw-r--r--  1 fangjun  staff    36M Aug  7 16:22 tiny.en-encoder.onnx

109 + 36 = 145 MB, which closely matches the expected size of about 148.77 MB.
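The arithmetic above can be checked in a couple of lines of Python:

```python
# tiny.en: 39 M parameters, 4 bytes each in float32.
params = 39 * 10**6
expected_mb = params * 4 / 1024 / 1024
print(f"expected size: {expected_mb:.2f} MB")  # → expected size: 148.77 MB
```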

@csukuangfj merged commit f5c060d into k2-fsa:master on Sep 20, 2023
99 of 109 checks passed
@csukuangfj deleted the fix-whisper branch on September 20, 2023

@csukuangfj (Collaborator, Author) commented:

Please re-download the whisper onnx model or re-export it with the latest code.
